restart workers after executing 10 tasks to mitigate memory leaks #840
Conversation
✅ Deploy Preview for conda-store ready!
@dcmcand @peytondmurray @kcpevey @trallard Could I get a review on this?
I'm happy to make this configurable via a traitlet as well if desired, but the worker restarts seem to happen quickly, so I can't see much downside to just applying it for all conda-store deployments.
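For context, the behavior described here is presumably Celery's `worker_max_tasks_per_child` setting. A minimal sketch of how it might be wired up, assuming a standard Celery app configuration (the module and broker names below are illustrative, not conda-store's actual layout):

```python
# Sketch: configure Celery to recycle each worker process after N tasks.
# App name and broker URL are placeholders, not conda-store's real settings.
from celery import Celery

app = Celery("conda_store_worker", broker="redis://localhost:6379/0")

# Replace a worker process after it has executed 10 tasks, releasing any
# memory that process has accumulated.
app.conf.worker_max_tasks_per_child = 10

# The equivalent CLI flag:
#   celery -A conda_store_worker worker --max-tasks-per-child=10
```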
I can't speak to the technical aspect of this, but I'm very glad to see it. I see the memory leakage on all my deployments. As far as I can tell, it leads to erroneously failed builds due to OOM, which also generally fail with errors that make no sense. All around, this is a big improvement for admin management and end-user UX.
Huh, with this option it looks like Celery will just automatically restart the worker after 10 tasks are completed. My one complaint is that, strictly speaking, this doesn't fix nebari-dev/nebari#2418, as we are only treating a symptom rather than the memory usage itself.
But in any case from what I have read it seems like this is exactly the kind of use case the option was intended for, so I guess let's just use it. Is there an easy way to test this? If so it would be nice to ensure that it works.
I agree it does just treat the symptom without fixing the underlying issue. That said, I think this will work robustly.
To test, you could check out this branch, then run conda-store standalone (https://conda.store/conda-store/how-tos/install-standalone), or run it in Docker containers (docker-compose up --build -d, as explained in the docs). After startup, you can visit localhost:8080/conda-store/admin and rebuild the environment it comes with a few times. If you watch the Docker memory usage (docker stats), you'll see that memory usage does not continue to climb with each new build as it does without this option set.
Ahh, what I meant was: is there a simple way of adding automated testing?
I am +1 with Peyton here. It is good that we can verify manually (once or so), but I would prefer having some sort of automated test (akin to load tests) to ensure this is, in fact, a useful fix.
I've followed up on where the memory leak is occurring using memray. I'm new to memray, but the flamegraph seems to point to line 20 in list_conda_prefix_packages at /opt/conda-store-server/conda_store_server/_internal/action/add_conda_prefix_packages.py. I'm not sure what that code is doing or why it would cause a memory leak, but I'm recording it here for now. I'll try to find out more later, or figure out how to do a more targeted fix.
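For anyone reproducing this, a minimal sketch of capturing a memray profile around the suspect code path and then rendering a flamegraph; the function and output file names below are placeholders, not conda-store APIs, and the original profiling here may have been done with the memray CLI instead:

```python
# Sketch: record allocations around a suspect call with memray's Python API.
# "build_environment" and the output path are illustrative placeholders.
import memray

def build_environment():
    # stand-in for the code path under suspicion, e.g. the action that
    # lists packages in a conda prefix
    ...

with memray.Tracker("worker_profile.bin"):
    build_environment()

# Then render the flamegraph from a shell:
#   memray flamegraph worker_profile.bin
```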
Thanks, this is actually enough to start narrowing this down to something we can properly troubleshoot. I still think it would be worth merging this feature, as there doesn't seem to be a real cost to it. I'll open another issue with the goal of actually eliminating the memory leak.
While I can see why this helps, we'd definitely need to do more profiling and memory analysis over time. As for merging, since this PR has been marked as needing tests, I think we should still hold until we have a test in place.
I plan to write up what I did to generate the memray flamegraph soon. As far as a test showing the leak, what I'm planning on doing is running a test that builds many conda envs and shows that the memory used by the conda-store worker Docker container continues to rise with each build. We could maybe use the limit_memory decorator in pytest-memray to make the test fail if it uses too much memory, but that would only be possible with the pytest-celery setup, which I was having trouble with, so I prefer the Docker tests for the moment.
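If that route does pan out, a rough sketch of what a pytest-memray guard could look like; the test body and the helper it calls are hypothetical, and the suite would need to run with pytest's `--memray` option:

```python
# Sketch: fail the test if the code under test allocates more than the budget.
# Requires pytest-memray; build_environment() is a hypothetical helper that
# triggers one environment build in-process.
import pytest

@pytest.mark.limit_memory("200 MB")
def test_repeated_builds_do_not_leak():
    for _ in range(10):
        build_environment()
```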
@trallard More info about the flamegraph generation is in the new issue Peyton opened: #848 (comment)
After a discussion with @Adam-D-Lewis and @trallard, it seems like this is potentially only observable in Docker. For now we will merge this without a test, as it is low risk, and come back to testing memory consumption later on once we know more.
Fixes nebari-dev/nebari#2418
Description
This pull request: configures the Celery worker to restart after executing 10 tasks to mitigate memory leaks.
Pull request checklist
Additional information
How to test
@trallard edited from: #840 (comment)
To test, you could check out this branch, then run conda-store standalone (https://conda.store/conda-store/how-tos/install-standalone), or run it in Docker containers (docker-compose up --build -d, as explained in the docs). After startup, you can visit localhost:8080/conda-store/admin and rebuild the environment it comes with a few times. If you watch the Docker memory usage (docker stats), you'll see that memory usage does not continue to climb with each new build as it does without this option set.
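As an alternative to eyeballing docker stats between rebuilds, the same check can be scripted with the Docker SDK for Python. A rough sketch, where the container name is an assumption and depends on your compose project:

```python
# Sketch: poll a container's memory usage instead of watching `docker stats`
# by hand. The container name below is a guess.
import docker

client = docker.from_env()
container = client.containers.get("conda-store-worker")

stats = container.stats(stream=False)  # one-shot stats snapshot
usage_mib = stats["memory_stats"]["usage"] / (1024 * 1024)
print(f"worker memory usage: {usage_mib:.1f} MiB")
```

Running this after each rebuild and comparing the numbers should show roughly flat usage with this option set, versus steadily climbing usage without it.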